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The dependency structure of multivariate data can be analyzed using the 
covariance matrix S. In many helds the precision matrix is even more in¬ 
formative. As the sample covariance estimator is singular in high-dimensions, 
it cannot be used to obtain a precision matrix estimator. A popular high¬ 
dimensional estimator is the graphical lasso, but it lacks robustness. We 
consider the high-dimensional independent contamination model. Here, even 
a small percentage of contaminated cells in the data matrix may lead to a 
high percentage of contaminated rows. Downweighting entire observations, 
which is done by traditional robust procedures, would then results in a loss 
of information. In this paper, we formally prove that replacing the sample 
covariance matrix in the graphical lasso with an elementwise robust covari¬ 
ance matrix leads to an elementwise robust, sparse precision matrix estimator 
computable in high-dimensions. Examples of such elementwise robust covari¬ 
ance estimators are given. The final precision matrix estimator is positive 
definite, has a high breakdown point under elementwise contamination and 
can be computed fast. 

1 Introduction 

Many statistical methods that deal with the dependence structures of mul¬ 
tivariate data sets start from an estimate of the covariance matrix. For 
observations xi,..., x„ G W with n > p, the classical sample covariance 
matrix 


S (1) 

Tl i . 

t=l 

where x G MP denotes the mean of the data, is optimal in many ways. It 
is easy to compute, maximizes the likelihood function for normal data, is 
unbiased and consistent. However, problems arise when p increases. For 
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p ^ n, the sample covariance matrix has low precision and for p > n it even 
becomes singular, such that the estimated precision matrix © := 5] is not 
computable anymore. This is a problem since there are many fields where 
the precision matrix is needed rather than the covariance matrix. Compu¬ 
tation of Mahalanobis distances or linear discriminant analysis are just two 
examples. The most popular field using precision matrices is probably Gaus¬ 
sian graphical modeling, where the nodes of the graph represent the different 
variables. If an element {&)ij of the estimated precision matrix equals zero, 
the variables i and j are independent given all the other variables, and no 
edge is drawn between the nodes representing variables i and j. Therefore, 
edges correspond to nonzero elements of the precision matrix. As a result, 
the whole graph can be recovered if the support of the precision matrix is 
known. This leads to an increasing interest in sparse precision matrices (pre¬ 
cision matrices with a lot of zero elements) as interpretation of the graph will 
be eased if the number of nonzeros in the precision matrix is kept small. 

The three most suitable techniques to compute sparse precision matrices 
that are also applicable in high dimensions are the graphical lasso (GLASSO) 
Friedman et al.l [20081 , the quadratic approximation method for sparse in¬ 


verse covariance learning (QUIG) [Hsieh et al. 2011] and the constrained 
Li-minimization for inverse matrix estimation (GLIME) [Gai et ^ [20IT| . 
All three methods start from the sample covariance matrix 51 and try to 
minimize a criterion based on the log-likelihood (see Section]^. Since these 
estimators use the nonrobust sample covariance matrix as an input, they are 
only suitable for clean data that do not contain any outliers. 

The problem, however, is that data is rarely clean. Thus, there is need 
for robust procedures. Most robust procedures downweight observations as a 
whole (‘rowwise downweighting’). However, in many statistical applications 
only a limited number of observations are available, while large amounts 
of variables are measured for each observation. Downweighting an entire 
observation because of one single outlying cell in the data matrix results in 
a huge loss of information. Additionally, the contaminating mechanism may 
be independent for different variables. In this case, the probability of having 
an observation without contamination in any cell is decreasing exponentially 
when the number of variables increases. As an example, imagine a data set, 
where each observation contains exactly one contaminated cell. Even though 
there is not a single fully clean observation, each observation still contains 
a lot of clean information. Nonetheless, the ‘classical’ robust procedures 
(that downweight whole observations) cannot deal with a data set like that, 
since they need at least half of the observations to be absolutely clean of 
contamination. This type of ‘cellwise’ or ‘elementwise’ contamination was 
formally described by Alqallaf et al. 2009 , who extend the usual Tukey- 


Huber contamination model (the model that considers whole observations as 
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either outlying or clean). In this more extensive setup, a random vector 


X = (Ip - B)y + Bz 


is observed, where y follows the model distribution and z some arbitrary 
distribution creating contamination, and y, B and z are independent. De¬ 
pending on the Bernoulli random variables Bi with = 1] = that build 
the diagonal matrix B = diag(Bi,..., Bp), different types of outliers are 
created. If all Bi are independent {i = l,...,p), we speak about ‘cellwise 
contamination’. If P[Bi = B 2 = ... = Bp] = 1, rowwise contamination is 
created. Under any type of contamination, the sample covariance matrix 
S is not a good estimator anymore, as it can be distorted by just a single 
outlying observation. 

For robust covariance matrix estimation under rowwise contamination, a 
lot of work has been done. One of the most popular rowwise robust co- 
variance estimators is the minimum covariance determinant [Rousseeuw and 
sen, 1999 . It has a high breakdown point and is very fast to 
compute. However, it is not computable in high-dimensions. Another esti¬ 
mator with very nice theoretical properties is the affine equivariant rank 

It is very efficient and has max- 


covariance matrix Ollila et al., 2003 


imal breakdown point. However, its computation is extremely time con¬ 
suming, especially in high-dimensions. Maronna and Zamar [2002 propose 
a high-dimensional covariance estimator, an orthogonalized version of the 
Gnanadesikan-Kettenring estimate (OGK). Another very simple estimator 
has been developed by Visuri et al. [2000] . Their spatial sign covariance 


matrix appeals through a simple definition and can be computed very fast, 
even in high-dimensions. Very recently, Ollila and Tyler [2014 have intro¬ 
duced a regularized M-estimator of scatter. Under general conditions, they 
show existence and uniqueness of the estimator, using the concept of geodesic 
convexity. 

Much less work has been done for covariance estimation under cellwise 
contamination. A first approach was taken by Van Aelst et al. 2011 , who 


defined a cellwise weighting scheme for the Stahel-Donoho estimator. How¬ 
ever, as for the original estimate, computation times are not feasible for 
larger numbers of variables. A very recent approach by [Agostinelli et ah 
2015 flags cellwise outliers as missing values and applies afterwards a row¬ 


wise robust method that can deal with missing values. By this, it can deal 
with cellwise and rowwise outliers at the same time, but again, computation 
for high-dimensions is not achievable. 

The first step to deal with cellwise outliers in very high-dimensions has 
been taken by Alqallaf et al. 20021. They first compute a pairwise correla¬ 


tion matrix. Afterwards the OGK estimate is applied to obtain a positive 


semidefinite covariance estimate. This method has been hne tuned by Tarr 
[Ms who use pairwise covariances instead of correlations [see also 


et al. 
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Tarr, 2014|. This matrix is then plugged into the graphical lasso (and simi¬ 


lar techniques) instead of the sample covariance matrix, resulting in a sparse 
precision matrix estimate. A very different approach has been taken by |Fine- 
gold and Drton [2011 . Replacing the assumption of Gaussian distribution of 
the data with t-distribution gives more robust results since the t-distribution 
has heavier tails. Assuming a so-called ‘alternative’ t-distribution (see Sec¬ 
tion]^ results in robustness against cellwise contamination. 

In this paper, we consider different high-dimensional precision matrix es¬ 
timators robust to cellwise contamination. Our approach is similar in spirit 

2014j, but we emphasize the difference in 


as 


Tarr et al. 2015 [see also Tarr 


Section [^ We start with pairwise robust correlation estimates from which 
we then estimate a covariance matrix by multiplication with robust standard 
deviations. This cellwise robust covariance matrix replaces then the sample 
covariance matrix in the GLASSO, yielding a sparse, cellwise robust preci¬ 
sion matrix estimator. The different nonrobust precision matrix estimators 
are introduced in Section [2j The cellwise robust covariance matrix estima¬ 
tors are explained in Section We discuss the selection of the regularization 
parameter in Section In Section the breakdown point of the proposed 
precision matrix estimator is derived. Simulation studies are presented in 
Section In Section we discuss possible applications of the proposed 
method and present a real data example. Section [^concludes. 


2 High-dimensional sparse precision matrix estimation 
for clean data 


Recently, a lot of effort has been put into designing estimators and efficient 
routines for high-dimensional precision matrix estimation. We focus here 
on sparse precision matrix estimation, that is, procedures that result in a 
precision matrix containing many zero elements. In this section, we review 
three techniques that start from an estimate of the covariance matrix S and 
then optimize a criterion based on the likelihood function to hnd the precision 
matrix estimate. Since the methods are based on the sample covariance 
matrix, they are only useful if no contamination is present in the data. 

The graphical lasso (GLASSO) [Friedman et al. 2008 maximizes the Li- 
penalized log-likelihood function; 


p 

0gl(X) = argmax logdet(0) - tr(i:0) - l(0)ifc|, (2) 

©^0 

where A 0 denotes a strictly positive definite matrix A and p is a regu¬ 
larization parameter. If the regularization parameter p is equal to zero, the 
solution of the GLASSO is the inverse of the sample covariance matrix. The 
larger the value of p is chosen, the more sparse the precision matrix estimate 


4 































becomes. Since the objective function Q is concave, there exists a unique 
solution. Banerjee et al. 2008 showed that the solution of the GLASSO 


always results in a strictly positive definite estimate 0 gl(X) for any p > 0, 
even if p > n, and this for any positive semidefinite, symmetric matrix 11 in 

The solution @glO^) can be computed via iterative multiple lasso regres¬ 
sion in a block coordinate descent fashion. That means that each column of 
the final estimate is computed separately. Looking at the first order condition 
only for the target column, the equation can be seen as a hrst order condi¬ 
tion of a multiple lasso regression. The GLASSO algorithm loops through 
all columns of the precision matrix iteratively, computing each time the mul¬ 
tiple lasso regression, until convergence of the precision matrix estimate is 
reached. Note that the algorithm does not use the data directly, but only 
uses it indirectly by using the sample covariance matrix. The GLASSO algo¬ 
rithm is implemented in Fortran and available through the R-package glasso 
Friedman et al., 2014 . However, this implementation sometimes encounters 
convergence problems. Therefore, we use in the remainder of this paper. 


the implementation of the GLASSO algorithm in the R-package huge Zhao 
|et al. 2014|, where these convergence issues have been solved 


Another algorithm solving Q is the quadratic approximation method for 
sparse inverse covariance learning (QUIG) [Hsieh et al. 


2011 . Both the 


GLASSO algorithm and the QUIG compute a solution for the same objective 
function. It turned out that QUIG was performing considerably slower in 
high dimensions than the GLASSO implementation in the R-package huge 
2014|, and therefore we will not deal with the former in this 


Zhao et al. 


paper. 

The constrained Li-minimization for inverse matrix estimation (GLIME) 
is defined as 


p p 

01 (X) = argmin |0ij| 

©eRPXP 

0c(X) = (%) with 6ij = 


subject to max |(X0 — Ip)ijj < p 
i=iv,p 


and 0i(X) = (0M 


The result is a symmetric matrix that is positive dehnite with high probabil¬ 
ity. The GLIME estimator 0c(X) converges fast towards the true precision 
matrix under some mild conditions. The algorithm is implemented in the 
R-package clime |Cai et ^ |2012| . Like the GLASSO algorithm, it does 


not use the data directly, but ony requires the sample covariance matrix as 
an input. Replacing the sample covariance matrix with a cellwise robust 
estimator (see Section]^, the resulting estimator is similarly accurate (with 
respect to Kullback-Leibler divergence measure, see Section as the one 
obtained when plugging the cellwise robust estimator into the GLASSO esti¬ 
mator. In some cases, plugging the robust estimator into the CLIME led to 
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slightly better accuracy. However, the computation time, was much higher 
than when plugging it into the GLASSO (for p = 60 the computation time 
was more than 10 times higher). Since in high-dimensional analysis compu¬ 
tation time is important, we will not consider this estimator in the remainder 
of the paper. 


3 Cellwise robust, sparse precision matrix estimators 


We start with computing a cellwise robust covariance matrix S by pairwise, 
robust estimation of the covariances. This cellwise robust covariance matrix 
can then be used to replace the sample covariance matrix in the GLASSO 
estimator (or another sparse precision matrix estimator). This results in 
a sparse, cellwise robust precision matrix estimate. Our approach differs 
from Tarr et al. [2015 in the selection of the initial covariance estimate. We 
estimate robust correlations and standard deviations separately to get the 
robust covariances. The resulting covariance matrix is then always positive 
semidehnite. This leads to a simplihcation of the estimator, increases the 
breakdown point and speeds up computation substantially. 


3.1 Robust covariance matrix estimation based on pairwise 
covariances 


Tarr et al. [2015 use the approach of Gnanadesikan and Kettenring [1972 to 


obtain a robust, pairwise covariance estimate. It is based on the idea that the 
robust covariance of two random variables X and Y can be computed using 
a robust variance. For the population covariance Cov and the population 
variance Var, the following identity holds 


1 


Cov(A, Y) = — [Var(aA + ^Y) - Var(aX - /?T)] 


(3) 


with a = 1/ Y^Var(A), /3 = 1/Y^Var(T). If Var is replaced by a robust 
variance estimator, a robust covariance estimate can be obtained. 

This approach has two drawbacks. Firstly, the addition and subtraction 
of different variables leads to a propagation of the outliers. Therefore, the 
resulting estimator has a breakdown point of less than 25% under cellwise 
contamination. Secondly, the resulting covariance matrix is not necessarily 

(Ms 


positive semidehnite. Therefore, Tarr et al. 


need to apply methods 
that ‘make’ the matrix positive semidehnite to be able to use this covari¬ 
ance matrix estimate as a replacement of the sample covariance matrix in a 
sparse precision matrix estimator. To this end, they use the orthogonalized 
Gnanadesikan-Kettenring (OGK) approach (Maronna and Zamar 2002 


as 


well as the comp utation of the n earest positive (semi)dehnite (NPD) matrix 
as suggested by Higham 2002 . Starting from an estimate S G for 
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the covariance matrix of the data X G NPD finds the closest positive 

semidefinite matrix S to the covariance estimate S in terms of the Frobenius 
norm 


S 


min ||S — S\\f, 

s^o 


where || AH^ = fc=i ^‘jk ^ matrix A = {ajk)j^k=i,...,p £ and A ^ 0 
denotes a positive semidefinite matrix. An algorithm to compute the near¬ 
est matrix S is implemented in the R-package Matrix under the command 
nearPDO. In our simulations, we observed that NPD gave in general better 
results than OGK and could also be computed considerably faster. 


3.2 Robust covariance matrix estimation based on pairwise 
correlations 


In contrast to Tarr et al. 2015 , we use a robust correlation estimator r(-) to 
estimate the pairwise covariance matrix (sjk) G 


Sjk = scale(x-^) scale (x^) r (x-^, x^) j,k = 1,... ,p 


(4) 


from the data X = (x^,...,x^) G W^^P, where scale() is a robust scale 
estimate like the median absolute deviation or the Q^-estimator [Rousseeuw 


and Croux, 1993 . Both estimators are equally robust with a breakdown point 


of 50%. Since the Q„-estimator is more efficient at the Gaussian model and 
does not need a location estimate, we opt for this scale estimate. The amount 
of contamination that the resulting covariance matrix S = {sjk)j,k=i,...,p can 
withstand depends then on the breakdown point of the scale estimator used 
(see Section]^. Using the Q^-scale, we obtain an estimator with a breakdown 
point of 50% under cellwise contamination. 

There are different possibilities for choosing a robust correlation estimator. 
Gaussian rank correlation [e.g. Boudt et al., 2012 is defined as the sample 


correlation estimated from the Van Der Waerden scores (or normal scores) 
of the data 


5'Ga«ss(x^X^) = 


Er=i 




n-l-1 




(5) 


where R{xij) denotes the rank of Xij among all elements of x-^, the jth col¬ 
umn of the data matrix. Similarly R{xik) stands for the rank of Xik among 
all elements of x^. Gaussian rank correlation is robust and consistent at 
the normal model. Still it is asymptotically equally efficient as the sample 
correlation coefficient at normal data. This makes it a very appealing robust 
correlation estimator. Note that the Gaussian rank correlations can easily be 
computed as the sample covariance matrix from the ranks R{xij) of the data. 
Since the sample covariance matrix is positive semidefinite, the covariance 
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matrix S using Gaussian rank correlation is also positive semidefinite. There¬ 
fore, we do not need to apply NPD or OGK to obtain a positive semidefinite 
covariance estimate. This saves computation time and simplifies the final 
precision matrix estimator. 

Another robust correlation estimator is Spearman correlation |Spearman 


1904 . It is dehned as the sample correlation of the ranks of the observations: 




rsp: 


'earman 


(x^x'^) = 


rH-1' 




Spearman correlation is slightly less efficient than Gaussian rank correlation. 
Additionally, it is not consistent at the normal model. To obtain consistency, 
the correlation estimator needs to be non linearly transformed. The trans¬ 
formation, however, destroys the positive semidefiniteness of the estimator 
S, and therefore we do not apply it. In our opinion, the inconsistency is not 
a huge problem because the asymptotic bias of the Spearman correlation is 
at most 0.018 [Boudt et al. 2012 . This is also confirmed by the simulations 
in Section where similar results are obtained with Spearman correlation 
as with Gaussian rank correlation. 

We also consider Quadrant correlation 


Blomqvist, 1950 . Quadrant cor¬ 


relation is defined as the frequency of centered observations in the first and 
third quadrant, minus the frequency of centered observations in the second 
and forth quadrant 


1 

rQuadrant{y^\y^^) = -y^sign ((5 
2=1 


■12 


- m.(idg=i^,„^nXij){xik - med£=i,...,„X£fc)), 


where sign(-) denotes the sign-function. Quadrant correlation is less effi¬ 


cient than Gaussian rank correlation and Spearman correlation Groux and 


Dehon, 2010 . Like Spearman correlation, Quadrant correlation is only con¬ 


sistent at the normal model if a transformation is applied to the correlation 
estimate. The final covariance matrix of the consistent Quadrant correlation 
is no longer positive semidefinite. Since we need a positive semidefinite co- 
variance matrix, we opt for the inconsistent Quadrant correlation. Note that 
the asymptotic bias at the normal distribution of the inconsistent Quadrant 
correlation is substantially higher than for Spearman correlation. Taking 
all this drawback of Quadrant correlation into account, it is not a surprise 
that we obtain worse simulation results with Quadrant correlation than with 
Spearman or Gaussian rank correlation in Section]^ 


3.3 Cellwise robust precision matrix estimation 

To obtain a cellwise robust precision matrix estimator, we adapt the defi¬ 
nition of the GLASSO estimator given in Q. Recall that GLASSO takes 
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the sample covariance estimator as an input and returns a sparse estimate 
of the precision matrix as an output. We will replace the sample covariance 
matrix by the cellwise robust covariance matrices S of Sections |3.1| and 3.2 in 


order to obtain a cellwise robust, sparse precision matrix estimator. Hence, 
we obtain the following estimator 


©s(X) = argmax logdet(0) — tr(S0) — p E 

©=(6»jfe)eKPxp 
©)-0 


( 6 ) 


j,fc=i 


If S is a robust covariance matrix based on pairwise correlations as in Sec¬ 
tion 3.2, we refer to 0s(X) as ‘correlation based precision matrix estimator’. 
If S is a covariance matrix based on pairwise covariances as in Section 3.1 
we call 0s(X) ‘covariance based precision matrix estimator’. Since the al¬ 
gorithm for computing the GLASSO only requires a positive semidefinite, 
symmetric matrix S as an input and not the data, we use it to compute 
©s(X). 

Like for the original GLASSO algorithm, the final precision matrix esti¬ 
mate 0s (X) will always be positive definite as long as the initial covariance 
matrix S is positive semidefinite, even \i p > n. Therefore, it is important 
that the initial covariance estimate S is positive semidefinite. 

The final precision matrix estimator 0s (X) will inherit the breakdown 
point of the initial covariance matrix S (see Section]^. As a result, the 
correlation based precision matrix estimator has a breakdown point of 50% 
under cellwise contamination, while the covariance estimators based on pair¬ 
wise covariances can have a breakdown point of at most 25% under cellwise 
contamination. 

The covariance matrices based on pairwise correlations we considered (i.e. 
the matrices based on Gaussian correlation, Spearman correlation, and Quad¬ 
rant correlation) are all positive semidefinite. Indeed, they can be computed 
as sample correlation matrices of transformed data. For instance, the quad¬ 
rant correlation matrix is a sample correlation matrix of the signs of the 
differences of the observations to their median. In contrast, covariance ma¬ 
trices based on pairwise covariances need to be transformed to be positive 
semidefinite for which we used the NPD method described in Section IQ 
Additionally, all pairwise robust covariances need to be computed according 
to (§, which may become very time consuming. Therefore, the correlation 
based precision matrix estimators are much faster to compute than the co- 
variance based precision matrix estimators. 

To sum up, correlation based precision matrix estimators are faster to 
compute and feature a higher breakdown point under cellwise contamination 
than covariance based precision matrix estimators. 
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4 Selection of the regularization parameter p 


When selecting the regularization parameter p, a good trade-off between a 
high value of the likelihood function and the sparseness of the final precision 
matrix has to be found. The two most common methods to find the optimal 
trade-off are the Bayesian Information Criterion (BIC) and cross validation 
(CV). 

The BIC for a Li-regularized precision matrix estimator 0p for a fixed 
value of p has been given in Yuan and Lin |2007 : 


BICciassicip) = - log det 0p -g tr(0pX:) -g 


log n 


n 


E' 

^<j 


-ij 


ip) 


with S the sample covariance estimate and = 1 if {&p)ij 7 ^ 0 and iij = 0 
otherwise. To obtain a cellwise robust BIC criterion, we replace S by a 
cellwise robust covariance matrix S and use a cellwise robust precision matrix 
estimator ®P- 

BIC{p) = -logdet0p -btr(0pS) -g 

i<j 

Computing the value of BIC over a grid, the value p yielding the lowest BIC 
is chosen. 

To perform iL-fold cross validation, the data first has to be split into K 
blocks of nearly equal size {k = 1,... ,K). Each block k is left out once 
and used as test data On the remaining data, the precision 

'' (~^) 

matrix estimate ®p ' is computed using the regularization parameter p. 
As an evaluation criterion, the negative log-likelihood on the test data is 
computed 

= -logdet0^“"4tr(S('=l0j,""\ 

where is the initial robust covariance estimate computed on the test 
data, i.e. 

= scale(x*(^)) scale(x|^^)r(x*(^),i, j = 1 ,... ,p 

exactly as in Equation Q . By using a robust covariance estimate computed 
from the test data, outliers present in the test data will not affect the cross- 
validation criterion too much. This is done over a range of values of p. 
The value of p minimizing the negative log-likelihood is chosen as the final 
regularization parameter 

1 ^ 

p = argmin — (7) 

^ ^ k=i 
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As pointed out by a referee, it could occure that some of the test data 
sets include a percentage of outliers exceeding the breakdown point of the 
precision matrix estimator, leading to possible breakdown of the cross valida¬ 
tion procedure. In our numerical experiments, with contamination levels low 
compared to the breakdown point and independent for different cells, we did 
not face this problem. Replacing the sum in 0 by a median, for instance, 
may provide a way out. 

To select a grid of values of p, we suggest to use the heuristic approach 
implemented in the huge-package Zhao et al., 2014 . It chooses a logarithmic 


spaced grid of ten values. The largest value of the grid depends on the value 
of the initial covariance matrix S 


Pmax = max max (S - !„)*,• - min (S - lp)ij 

The smallest value of the grid is then a tenth of the largest value pmin = 
O.lpmax- To obtain a logarithmic spaced grid, ten equally spaced values be¬ 
tween log(/9min) and log(/9max) are transformed via the exponential function. 
We will use this grid of p-values in the remainder of the paper. 

In general, the BIC criterion can be computed faster than cross validation. 
However, BIC tends to select too sparse models in practice. In our opin¬ 
ion, the gain in accuracy when using cross validation is worth the increased 
computation time. Therefore, we will use five-fold cross validation in the 
remainder of the paper. 



5 Breakdown point 

In Section we obtain precision matrix estimators by replacing the sample 
covariance matrix in the GLASSO with robust covariance matrices. It is not 
immediately clear if the cellwise robustness of the initial covariance estimator 
trans lates to cellwise robustness of the final precision matrix estimator. The- 
shows that the hnal precision matrix estimator ©s indeed inherits 


5.2 


orem 

the breakdown point of the covariance matrix estimator S. Furthermore, we 
formally show in Proposition |5 . 3| that the proposed initial covariance matrix 
estimators based on pairwise correlations are cellwise robust. 

One of the most common measurements of robustness is the finite-sample 
breakdown point. We refer to Maronna et al. 2006 for the standard defi¬ 


nition, i.e. under rowwise contamination. The breakdown point denotes the 
smallest amount of contamination in the data that drives the estimate to the 
boundary of the parameter space. For example, a location estimator needs to 
stay bounded, a dispersion estimator needs to stay bounded and away from 
zero. More formally, define for any symmetric p x p matrices A and B 


D{A,B) = max{|Ai(A) - Ai(B)|, |Ap(A)-i - Ap(B)-i|}, 
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where the ordered eigenvalues of a matrix A are denoted by 0 < Ap(A) < 
... < Ai(A). We define the finite-sample breakdown point under cellwise 
contamination of a precision matrix estimate 0 as 

yv. Tn A. A. 

en(0,X)= min {— : sup L>(0(X), 0(X™)) = oo}, ( 8 ) 

m=l,...,n n X"* 

where X"^ denotes a corrupted sample obtained from X G by replacing 
in each column at most m cells by arbitrary values. Similarly, we can define 
the explosion finite-sample breakdown point under cellwise contamination of 
a covariance matrix estimate S as 

771 

4(8, X)= niin {—: sup | Ai(S(X)) - Ai(S(X'"))| = cx)}, (9) 

m=l,...,n n y^m 

where X”^ denotes a corrupted sample obtained from X by replacing in each 
column at most m cells by arbitrary values. 

Finally, recall the definition of the explosion breakdown point of a univari¬ 
ate scale estimator scale(-): 

e4scale,x) = min {— : sup scale(x™') = oo}, 

72 -^m 

where x™ is obtained from x G M” by replacing m of the n values by arbitrary 
values. 

To proof the main theorem of this section, we use different properties of 
eigenvalues, which we summarize in the following lemma. 

Lemma 5.1. Let A,B G W^p and denote their smallest (largest) eigenval¬ 
ues by Xp{A) (\i{A)) and Ap(B) ('Ai(B)j, respectively. Then the following 
statements are true: 

(a) If A and B are positive semidefinite, then 


Ap(AB) < Ai(A)Ap(B), 

( 10 ) 

Ap(A)Ap(B) < Ap(AB). 

( 11 ) 

(b) If A and B are symmetric, then 


Ai(A + B) = Ai(A) + Ai(B). 

( 12 ) 

(c) Denoting A = we have 


Ai(A) <p max . 

(13) 


Proof [(a)][Sei^ [2008] 6.76 [(^[Se^ [2008] 6.71, [(c)][Sebm1 [2008] 6.26a □ 


Now, we can show that replacing the sample covariance matrix in the 
GLASSO by a robust covariance matrix S leads to a precision matrix esti¬ 
mator 0s (X) that inherits its robustness from S. 
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Theorem 5.2. The finite sample breakdown point under eellwise contami¬ 
nation of the robust precision matrix estimator ©s(X) fulfills 

en(0s,X) >e+(S,X) (14) 

with S a positive semidefinite eovariance estimator. 


Proof. Let 1 < m < n be the maximum number of cells in a column that have 
been replaced to arbitrary positi ons. Since S(X^) is p ositive semidefinite, 


0s(X™) is positive definite [see [Banerjee et al.| |2008| Theorem 3]. The 
estimate 0s(X”^) needs to fulfill the first order condition of (IM: 


-1 


0 = 0s (X-) - S(X™) - pSign0s(X-), 


(15) 


where (Sign 0s(X™'))jfc = sign ©s(X™')jfc for j,k = 1,... ,p. If ©s has zero 
components, the first order condition (15) corresponds to a subdifferential 


and the sign function at 0 needs to be interpreted as the set [—1,1] [Bertsek^ 


19951. We then obtain 

Ip = (S(X™) +pSign0s(X"*))0s(X™). 

Thus, the smallest eigenvalue fulfills 

1 = Ap(Ip) = Ap((S(X™) + pSign0s(X”*))©s(X"^)). 


Using (10), we get 

1 < Ai(S(X™) + pSign0s(X”*))Ap(0s(X"^)). 

By definition 0s(X™') is always symmetric, therefore also pSign(0s(X”^)). 
As a result, (12) yields 

1 < [Ai(S(X™)) + Ai(pSign0s(X™))]Ap(©s(X™)). 

As 0s(X”^) is positive definite, we obtain 

< Ai(S(X"*))+ pAi(Sign©s(X”^)). 


1 


Ap(0s(X-)) 

From the definition of the Sign-function, we know that |(Sign©s(X™'))jj| < 
1. Together with (13), this yields 


|Ai(Sign0s(X'"))| <p. 


(16) 


resulting in 


Ap(0s(X"^))-i<Ai(S(X”^)) + pp. 


(17) 


13 














From the definition of the explosion breakdown point Q, we know that 
for every rh < ne^ (S,X) there exists an M < oo such that 

Ai(S(X™)) <M + Ai(S(X)). (18) 

Using © in ( jls] ) yields 

0 < Ap(0s(X™))-i < Ai(S(X™)) +pp<M + Ai(S(X)) + pp. 
Together with the triangle inequality this gives 

lApCesCX*^))-! - Ap(©s(X))-^| < Ap(0s(X’^))-i + Ap(0s(X))-i (19) 

< M + Ai(S(X)) + Ap(0s(X))-^ + pp. 

( 20 ) 

To obtain a bound for the largest eigenvalue Ai( 05 (X™')), denote for any 
matrix 0^0 

p 

Q(0,X) = logdet0-tr(S(X)0) -p ^ \9jk\. 

j,k=l 

For the identity matrix, we obtain for contaminated data X™ 

Q(Ip,X™) = 0 - tr(S(X™)) -pp> -pAi(S(X™)) - pp 
since tr(A) = — P"^i(-^) for ^-riy matrix A G Using 


Equation 


this leads to 

g(ip,x™) > -pM - p\i{s{X)) - pp. 


For any matrix 0 0, we obtain with (11) 

p 

tr(S0) = ^Aj(S0) > Ap(S0) > Ap(S)Ap(0) > 0. 
i=i 


( 21 ) 


Furthermore, (13) yields 

p 


1 


V \0jk\ > max lOjkl > -Ai(0). 

j,k=l,...,p p 

*j=i 


( 22 ) 


Equations (21) and (22) lead to 


g(0,X"*) =logdet©-tr(S©)-p ^ |%fc| 

j,k=l 

<plogAi(©)-^Ai(0) 

P 
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because det(A) = 0^=1 ^ Ai(A)p for any matrix A e 

The function x i—)• plogx — px/p is concave and attains its maximum at 
X = p^!p. Therefore, there exists a finite constant M* > p^/p, such that 

plogM* - -M* = -pM - p\i{S(X)) - pp. 

p 

As a results, we know that any matrix 0 with Ai(0) > M* is not optimizing 
(§ since g(Ip,X^) > Q(0,X™). Hence, 

0 < Ai(0s(X™)) < M*. 


Together with the triangular inequality, this yields 


|Ai(0s(X™)) - Ai(0s(X))| < Ai(©s(X-)) + Ai(0s(X)) (23) 

<M* + Ai(0s(X)). (24) 

Thus, ( [T^ and ( [^ lead to 

supI)(0s(X), 0s(X™)) < max{M + Ai(S(X)) + pp + Ap(0s(X))-\ M* + Ai(0s(X))} 

Xm 


for any m < ne^ (S,X), yielding (|14[). 


□ 


We still need to verify that the covariance matrix estimator based on pair¬ 
wise correlations has a high explosion breakdown point under cellwise con¬ 
tamination. 


Proposition 5.3. The explosion breakdown point under cellwise contamina¬ 
tion of the covariance estimator based on pairwise correlations as defined in 
0 depends on the explosion breakdown point of the scale estimator used 

e)(^(S,X)> max (scale, x-^). (25) 

Proof. Using the triangular inequality, and the fact that a correla¬ 

tion has an absolute value smaller than 1 , we obtain 

|Ai(S(X)) - Ai(S(X'"))| < |Ai(S(X))| +p max | scale((X'")g|| scale((X™)'')| 

j,k=l,...,p 


for any m G {1,... ,n}, where denots the jth column of matrix X™', 

and therefore □ 


Note that the explosion breakdown point of the scale estimator in (25) is 
the breakdown point of a univariate estimator. Breakdown points of scale es¬ 
timators have been studied extensively [see e.g. Rousseeuw and Croux, 1993j . 
The median absolute deviation as well as the Qn-estimator have an explo¬ 
sion breakdown point of 50%, resulting in a breakdown point of 50% under 
cellwise contamination for the correlation based precision matrix estimator 
proposed in Section 


15 









6 Simulations 


In this section, we present a simulation study to compare the performance of 
the estimators introduced in Section For the correlation based precision 
matrix estimator, we choose the Qn-estimator as a scale. As robust corre¬ 
lation, we use Gaussian rank correlation, Spearman correlation and Quad¬ 
rant correlation, resulting in the three different estimators ‘GlassoGaussQn’, 
‘GlassoSpearmanQn’ and ‘GlassoQuadQn’, respectively. As a point of refer¬ 
ence, we also include the nonrobust, classical GLASSO Q and abbreviate 
it as ‘GlassoGlass’. Additionally, we compute a covariance based precision 
matrix estimate, where we choose Qn as the scale estimator and NPD to 
obtain a positive semidefinite covariance estimate (‘GlassoNPDQn’). This 
estimator represents the class of estimators studied by |Tarr et al. |2015] . 

To compare to a rowwise, but not cellwise robust estimator that can be 
computed in high dimensions, we consider the spatial sign covariance matrix 


Visuri et al., 2000 


1 " 

SS™“(X) = - C{xj - A)C/(xj - A)T, 

^=1 


(26) 


where U{y) = ||y||^ y if y / 0 and U{y) = 0 otherwise, and ||y ||2 stands 
for the Euclidean norm. The location estimator fi is the spatial median, 
i.e. the minimizer of 11^* ~ /^Ib- Since only the eigenvectors of (26) 


are consistent estimators for the eigenvectors of the covariance matrix at the 
normal model, we still need to compute consistent eigenvalues. Let U denote 


the matrix of eigenvectors of (26). The eigenvalues of the covariance matrix 
are then given by the marginal variances of U''~xi,... To robustly 

estimate these marginal variances, we use the robust scale estimator Qn- 


Denote the matrix of robust eigenvalues as A = diag(Ai, 
consistent spatial sign covariance matrix is 


,Xp). Then the 


Ss^gn{^) = UAU ' 


The spatial sign covariance matrix is positive semidefinite. Therefore, we 
use it as an input in the GLASSO, as in Equation (|^, to obtain a sparse 
precision matrix estimate which is robust under rowwise contamination. We 
refer to this precision matrix estimator as ‘GlassoSpSign’. Einally, we also 
add the inverse of the classical sample covariance matrix ([^ as a benchmark 
(‘Classic’), where it can be computed. For all estimators, we select the regu¬ 
larization parameter p via five-fold cross validation over a logarithmic spaced 
grid (see Section]^. 


Sampling schemes. We use in total four sampling schemes covering the 
scenarios of a banded precision matrix, a sparse precision matrix, a dense 
precision matrix Cai et al., 2011 and a diagonal precision matrix. Each 
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sampling scheme is defined through the true precision matrix 0o G for 

i,j = 1, • • ■ 

• ‘banded’: {&o)ij = 

• ‘sparse’: ©o = B + 61p with F[bij = 0.5] = 0.1 and = 0] = 0.9 for 
i ^ j. The parameter <5 is chosen such that the conditional number of 
©0 equals p. Then the matrix is standardized to have unit diagonals. 

• ‘dense’: (©o)ii = 1 and {Qo)ij = 0.5 for i j 

• ‘diagonal’: (0o)n = 1 and (©o)ii = 0 for i / j 

For each sampling scheme, we generate M = 100 samples of size n = 100 
from a multivariate normal AA(0,©g^). We take as dimension p = 60 and 

p = 200. 


Contamination. To simulate contamination, we use two different con¬ 
tamination settings Finegold and Drton, 2011| : (i) To every generated data 
set, we add 5 or 10% of cellwise contamination. Therefore, we randomly 
select 5 and 10% of the cells and draw them from a normal AA(10,0.2). (ii) 

To simulate model deviation, we draw all observations from an alternative 
t-distribution 2(0, ©g of dimension 100 with 2 degree of freedom. 

Recall that a multivariate t-distributed random variable x ~ tn,!/(0, T') is 
defined as a multivariate normally distribnted random variable y = (pi,..., yp)~^ 
Mp{0,^) divided by a gamma distributed variable r F(zz/2,zz/2) 


X = 


y 




To obtain an alternative t-distributed random variable x = (xi,... ,Xp)^ ~ 
we draw p independent divisors Tj ~ F(zz/2, zz/2) for the different 
variables j = 1,... ,p 


Xn = 



The heaviness of the tails is then different for different variables of x. 

Performance measures. We assess the performance of the estimators 


using the Kullback-Leibler divergence Biihlmann and van de Geer, 2011 
p437] 

KL{&, © 0 ) = tr(©Q ^©) — logdet(©g ^0) — p. 


It measures how close the obtained estimate 0 is to the true parameter ©q. 
Lower values represent a better estimate. If the estimator is equal to the 
true precision matrix, the Kullback-Leibler distance is equal to zero. The 
less accurate the precision matrix is estimated, the higher the value of the 
Kullback-Leibler distance becomes. 
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To measure how well the sparseness of the true precision matrix is recov¬ 
ered, we also look at false positive (FP) and false negative (FN) rates: 

pp ^ |{(i, j) : ^ = 1,.. ■ ,n; j = 1,. ■ ■ : (0)^^ / 0 A = 0}| 

|{(z,j) : z = 1,... ,n; j = 1,... : (&o)ij = 0 }| 

pjy ^ |{(i, j) : i = 1,... ,n;j = 1,... ,p : (0)ij = 0 A (0o)ij / 0}| 

|{(i,j) : i = : (©o)ij / 0 }| 

The false positive rate gives the percentage of zero-elements in the true pre¬ 
cision matrix that are wrongly estimated as nonzero. In contrast, the false 
negative rate gives the percentage of nonzero-elements in the true precision 
matrix that are wrongly estimated to be zero. Both values are desired to 
be as small as possible. However, a large false negative rate has a worse 
impact since it implies that associations between variables are not found and 
therefore important information is not used. A large false positive rate in¬ 
dicates that unnecessary associations are included, which ‘only’ complicates 
the model. Note that if 0o does not contain any zero-entries, the false pos¬ 
itive rate is not dehned. In graphical modeling, a high false negative rate 
indicates that many non-zero edges that should be included in the estimated 
graph are missed. This implies that there are conditional independencies 
assumed which are not supported by the true graph. 

Simulation results. Results for p = 60 are given in Table For clean 
data in the banded scenario, the classical GLASSO (‘GlassoClass’) is per¬ 
forming best, achieving lowest values of KL. Only marginally higher values 
of KL are obtained by the correlation based precision matrix using Gaussian 
rank correlation (‘GlassoGaussQn’) and the regularized spatial sign covari¬ 
ance matrix (‘GlassoSpSign’). Their good performance can be explained by 
their high efficiency at the normal model. Even though this data is clean, the 
inverse of the sample covariance matrix (‘Classic’) is performing very poorly. 
This is due to the low precision of the sample covariance matrix for a data 
set with p > n/2. Regularization of the inverse of the sample covariance ma¬ 
trix is solving the problem, as we see from the classical GLASSO. Note that 
the sample covariance matrix always gives an FN of zero, since the resulting 
estimate is not sparse, and should therefore not be considered to evaluate 
the performance of the sample covariance matrix. The correlation based pre¬ 
cision matrix using Spearman correlation (‘GlassoSpearmanQn’) obtains a 
slightly higher value of KL than ‘GlassoGaussQn’. It probably suffers from 
its inconsistency. This also explains why the KL of the correlation based 
precision matrix using Quadrant correlation (‘GlassoQuadQn’) is so much 
higher, since the asymptotic bias of the Quadrant correlation is considerably 
higher than that of Spearman. The performance of the covariance based 
precision matrix (‘GlassoNPDQn’) lies in between ‘GlassoSpearmanQn’ and 
‘ GlassoQuadQn’. 
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Under contamination, the relative performance of the different estimators 
changes. Clearly, the classical GLASSO is not robust, and it achieves the 
highest values of KL of all estimators. Also the regularized spatial sign co- 
variance matrix does not perform well. This is no surprise since for 5% of 
cellwise contamination, already more than 90% of the observations are ex¬ 
pected to be contaminated. Thus, the level of rowwise contamination is too 
high for ‘GlassoSpSign’ to obtain reliable results. Best performance under 
contamination is obtained by the correlation based precision matrices using 
Gaussian rank or Spearman correlation. They give lowest values of KL for 
all three contamination schemes. Moderately larger values are obtained by 
‘GlassoQuadQn’. Of the cellwise robust estimators, the covariance based 
precision matrix estimator is performing worst under contamination. It ob¬ 
tains highest values of KL and FN in all three contamination settings. Under 
10% of cellwise contamination the value of KL of ‘GlassoNPDQn’ is nearly 
double that of ‘GlassoSpearmanQn’. 

Looking at the other three sampling schemes ‘sparse’, ‘dense’ and ‘diag¬ 
onal’, the conclusions are very similar to that of the banded scheme: For 
clean data ‘GlassoGlass’ is doing best, closely followed by ‘GlassoGaussQn’ 
and ‘GlassoSpSign’. Under contamination ‘GlassoGaussQn’ and ‘GlassoS¬ 
pearmanQn’ are performing best, while ‘GlassoNPDQn’ gives worst results 
of all cellwise robust estimators. For the sparse settings ‘sparse’ and ‘diag¬ 
onal’ we also compare the different values of the FP and FN. In the setting 
‘diagonal’ the values are more or less the same for all estimators (apart from 
the sample covariance matrix which does not give sparse results and therefore 
has a FP equal to one). In the setting ‘sparse’, differences are more outspo¬ 
ken. The covariance based precision matrix estimator gives a FN of up to 
double that of ‘GlassoGaussQn’ or ‘GlassoSpearmanQn’, which is not made 
up by the slightly lower value of FP. In graphical modeling that means that 
many nonzero edges are missed by ‘GlassoNPDQn’, while they are correctly 
identified by ‘GlassoGaussQn’ and ‘GlassoSpearmanQn’. 

The simulation results for p = 200 are given in Table Since p > n, the 
sample covariance matrix cannot be inverted anymore and is excluded from 
the analysis. Overall, the conclusions are similar to p = 60. For clean data, 
the classical GLASSO performs best. Marginally larger values of KL are 
obtained by ‘GlassoGaussQn’ and ‘GlassoSpSign’. In comparision to p = 60, 
here also ‘GlassoSpearmanQn’ is doing very well for clean data. 

For p = 200, we see again that under any type of contamination the clas¬ 
sical GLASSO and the regularized spatial sign covariance matrix are not 
reliable any more. In contrast, the cellwise robust correlation based preci¬ 
sion matrix estimators achieve very good results, especially in combination 
with Gaussian rank or Spearman correlation. Their KL as well as their FN 
are lowest of all estimators for all settings considered here. The covariance 
based correlation estimate is considerably less accurate than the correlation 
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Table 1: Simulation results for n = 100 and p = 60: Kullback-Leibler criterion (KL), 
false positive rate (FP) and false negative rate (FN) averaged over M = 100 
simulations reported for 7 estimators and 4 sampling schemes 




clean 

5% cellwise 

10% cellwise 

alternative t 



KL 

FP 

FN 

KL 

FP 

FN 

KL 

FP 

FN 

KL 

FP 

FN 


GlassoClass 

8.97 


.70 

55.00 


.95 

77.11 


.94 

143.16 


.98 


GlassoQuadQn 

14.96 


.83 

19.20 


.86 

24.44 


.90 

31.10 


.87 


GlassoGaussQn 

9.62 


.75 

16.91 


.83 

23.52 


.88 

28.41 


.84 

banded 

GlassoSpearmanQn 

10.09 


.76 

16.32 


.83 

22.69 


.87 

27.92 


.84 


GlassoNPDQn 

11.90 


.85 

21.73 


.91 

43.59 


.97 

37.34 


.92 


Classic 

71.54 


.00 

49.24 


.00 

61.01 


.00 

67.84 


.00 


GlassoSpSign 

9.53 


.74 

53.92 


.96 

77.54 


.95 

80.41 


.94 


GlassoClass 

5.87 

.23 

.09 

63.70 

.02 

.82 

88.81 

.04 

.81 

140.09 

.00 

.85 


GlassoQuadQn 

10.28 

.15 

.38 

14.20 

.12 

.47 

19.04 

.09 

.56 

26.28 

.10 

.45 


GlassoGaussQn 

6.34 

.21 

.11 

12.25 

.16 

.30 

18.39 

.11 

.49 

24.09 

.13 

.28 

sparse 

GlassoSpearmanQn 

6.74 

.21 

.13 

11.75 

.16 

.27 

17.67 

.12 

.43 

23.71 

.14 

.26 


GlassoNPDQn 

8.25 

.13 

.23 

17.85 

.06 

.47 

42.14 

.01 

.82 

32.73 

.06 

.52 


Classic 

71.54 

1.00 

.00 

49.39 

1.00 

.00 

66.83 

1.00 

.00 

62.79 

1.00 

.00 


GlassoSpSign 

6.35 

.21 

.11 

62.36 

.01 

.83 

89.37 

.02 

.83 

76.37 

.04 

.65 


GlassoClass 

4.40 


.92 

42.52 


.96 

64.95 


.94 

128.00 


.98 


GlassoQuadQn 

4.65 


.94 

7.66 


.95 

11.66 


.96 

20.72 


.97 


GlassoGaussQn 

4.59 


.93 

7.59 


.94 

11.70 


.96 

20.72 


.96 

dense 

GlassoSpearmanQn 

4.61 


.94 

7.59 


.94 

11.80 


.96 

20.81 


.97 


GlassoNPDQn 

5.01 


.96 

13.69 


.98 

30.98 


.98 

28.43 


.98 


Classic 

71.54 


.00 

39.88 


.00 

49.44 


.00 

59.08 


.00 


GlassoSpSign 

4.62 


.94 

41.65 


.97 

65.21 


.96 

69.46 


.98 


GlassoClass 

1.31 

.05 

.00 

66.11 

.01 

.00 

93.48 

.03 

.00 

124.03 

.00 

.00 


GlassoQuadQn 

1.55 

.04 

.00 

4.60 

.03 

.00 

8.68 

.03 

.00 

17.69 

.02 

.00 


GlassoGaussQn 

1.53 

.04 

.00 

4.54 

.04 

.00 

8.67 

.03 

.00 

17.61 

.02 

.00 

diagonal 

GlassoSpearmanQn 

1.55 

.04 

.00 

4.57 

.04 

.00 

8.68 

.03 

.00 

17.78 

.02 

.00 


GlassoNPDQn 

1.92 

.02 

.00 

11.02 

.00 

.00 

33.94 

.00 

.00 

25.46 

.00 

.00 


Classic 

71.54 

1.00 

.00 

48.41 

1.00 

.00 

68.67 

1.00 

.00 

56.26 

1.00 

.00 


GlassoSpSign 

1.54 

.04 

.00 

62.99 

.01 

.00 

93.75 

.02 

.00 

66.05 

.01 

.00 


based estimates. Under higher amounts of cellwise contamination ‘GlassoN- 
PDQn’ can have a KL of more than four times the value of the correlation 
based precision matrix estimators. Besides, its FN is higher in all settings 
considered. 

Since in high-dimensional analysis computation time is important for prac¬ 
tical usage of the estimators, Table gives an overview of the average com¬ 
putation time that the different estimators require. The computation time 
was comparable throughout the different simulation schemes. Therefore, we 
only give averages. Note that the reported computation time includes the 
selection of p via 5-fold crossvalidation. For p = 60, the correlation based 
precision matrices, the classical GLASSO and the regularized spatial sign 
covariance matrix need very similar computation times. This indicates that 
the GLASSO algorithm takes most of the computation time and that the 
computation time of the initial covariance matrices is negligible. In contrast, 
the covariance based precision matrix estimator is nearly four times slower. 
For p = 200, the classical GLASSO and the regularized spatial sign covari- 
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Table 2: Simulation results for n = 100 and p = 200: Kullback-Leibler criterion (KL), 
false positive rate (FP) and false negative rate (FN) averaged over M = 100 
simulations reported for 7 estimators and 4 sampling schemes 




clean 

5% cellwise 

10% cellwise 

alternative t 



KL 

FP 

FN 

KL 

FP 

FN 

KL 

FP 

FN 

KL 

FP 

FN 


GlassoClass 

38.32 


.89 

187.21 


.98 

262.04 


.98 

Inf 


.99 


GlassoQuadQn 

56.42 


.94 

70.67 


.95 

86.97 


.97 

112.04 


.96 


GlassoGaussQn 

40.18 


.91 

63.97 


.94 

84.67 


.96 

103.21 


.94 

banded 

GlassoSpearmanQn 

41.53 


.90 

61.90 


.93 

82.92 


.96 

101.81 


.93 


GlassoNPDQn 

Classic 

52.42 


.96 

102.91 


.98 

200.13 


.99 

164.86 


.99 


GlassoSpSign 

39.68 


.90 

189.47 


.98 

265.32 


.98 

326.08 


.99 


GlassoClass 

46.65 

.10 

.52 

220.90 

.02 

.93 

302.94 

.02 

.93 

Inf 

.00 

.95 


GlassoQuadQn 

60.42 

.09 

.72 

75.70 

.07 

.77 

93.11 

.05 

.81 

119.80 

.06 

.78 


GlassoGaussQn 

48.45 

.10 

.54 

69.70 

.08 

.69 

90.69 

.06 

.79 

115.29 

.06 

.71 

sparse 

GlassoSpearmanQn 

49.60 

.10 

.56 

68.27 

.08 

.67 

88.92 

.06 

.76 

114.46 

.06 

.70 


GlassoNPDQn 

Classic 

58.64 

.07 

.64 

111.15 

.03 

.84 

215.95 

.00 

.95 

167.81 

.02 

.85 


GlassoSpSign 

47.97 

.10 

.54 

223.28 

.01 

.93 

306.49 

.01 

.94 

339.53 

.01 

.93 


GlassoClass 

9.70 


.97 

137.73 


.98 

214.08 


.98 

Inf 


.99 


GlassoQuadQn 

10.41 


.98 

21.12 


.98 

35.06 


.98 

66.07 


.99 


GlassoGaussQn 

10.35 


.98 

20.91 


.98 

35.06 


.98 

65.54 


.99 

dense 

GlassoSpearmanQn 

10.39 


.98 

21.11 


.98 

34.92 


.98 

65.80 


.99 


GlassoNPDQn 

Classic 

15.27 


.99 

65.43 


.99 

146.61 


.99 

121.14 


.99 


GlassoSpSign 

10.53 


.98 

140.17 


.99 

217.08 


.98 

270.10 


.99 


GlassoClass 

5.41 

.02 

.00 

224.15 

.01 

.00 

317.07 

.01 

.00 

Inf 

.00 

.00 


GlassoQuadQn 

6.14 

.02 

.00 

17.12 

.01 

.00 

30.71 

.01 

.00 

61.51 

.01 

.00 


GlassoGaussQn 

6.05 

.02 

.00 

17.15 

.01 

.00 

30.66 

.01 

.00 

61.37 

.01 

.00 

diagonal 

GlassoSpearmanQn 

6.07 

.02 

.00 

17.08 

.01 

.00 

30.71 

.01 

.00 

61.18 

.01 

.00 


GlassoNPDQn 

Classic 

10.83 

.00 

.00 

63.60 

.00 

.00 

167.73 

.00 

.00 

114.93 

.00 

.00 


GlassoSpSign 

6.29 

.01 

.00 

225.91 

.01 

.00 

320.17 

.01 

.00 

265.02 

.00 

.00 


ance matrix can be computed fastest. But as they are not robust enough, 
the estimates are very inaccurate. Computation of the correlation based es¬ 
timators is still very fast here. The estimation including the selection of p 
over a grid of ten values takes less than 10 seconds. In contrast, estimation 
of the covariance based precision matrix takes more than 20 times longer. 

Since we advertise the high breakdown point of the correlation based pre¬ 
cision matrix estimators, we also look at the performance of the estimators 
under higher amounts of cellwise contamination, ranging from 0 to 40%. 
Fig.[T] plots the value of KL for the most representative precision matrix 
estimators for p = 60 (left panel) and p = 200 (right panel), following the 
‘banded’ sampling scheme. As expected, the nonrobust ‘GlassoClass’ results 
in the highest values of KL. For higher amounts of cellwise contamination, 
the KL of the ‘GlassoNPDQn’ deteriorates quickly. This is in contrast with 
the more robust ‘GlassoGaussQn’, where the KL measure remains limited 
for higher contamination levels, both for p = 60 and p = 200. The results 
for the sampling schemes ‘sparse’, ‘dense’ and ‘diagonal’ are comparable to 
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Table 3: Computation time (in sec.) for samples of size n = 
of p via 5-fold cross-validation) averaged over M = 
simulation schemes reported for 7 estimators 



p = 60 

p = 200 

GlassoClass 

5.93 

7.69 

GlassoQuadQn 

6.13 

9.12 

GlassoGaussQn 

6.09 

9.15 

GlassoSpearmanQn 

5.82 

9.01 

GlassoNPDQn 

22.85 

216.79 

Glassic 

0.00 

0.02 

GlassoSpSign 

5.73 

8.11 


100 (including selection 
100 simulations and all 


Fig.[T] and are therefore omitted. 

To summarize, for clean data the classical GLASSO performs best. Under 
cellwise contamination, ‘GlassoGaussQn’ and ‘GlassoSpearmanQn’ achieve 
best results. All three estimators can be computed equally fast. Since the 
‘GlassoGaussQn’ is consistent and performs similarly well as the classical 
GLASSO for clean data, we advise the ‘GlassoGaussQn’ for high-dimensional 
sparse precision matrix estimation under cellwise contamination. 


7 Applications 


In this paper, we describe how a cellwise robust, sparse precision matrix 
estimator can be obtained. To show the applicability of the introduced es¬ 
timator to a real world data set, we use the dataset stockdata, which is 
publicly available through the R-package huge |Zhao et al. 2014 . It consists 
of the closing prices of p = 452 stocks in the S&P on all trading days between 
January 1,2003 and January 1, 2008, leading to n = 1258 observations. We 


use the same data transformations and parameter choices as in Zhao et al. 
2012 . The estimated graphical models returned by ‘GlassoClass’ and ‘Glas¬ 


soGaussQn’ are visualized in Panel (a) and (b) of Fig. From the plots, we 
can conclude that the two graphs are very similar. Indeed, only around 2% 
of the selected edges in ‘GlassoClass’ are not selected in ‘GlassoGaussQn’, 
while the percentage is even smaller vice versa. As a result, we assume that 
stockdata is a rather clean data set. 

To see how the estimators behave under contamination, we randomly select 
5% of the cells of the data matrix and replace them by replicates of the 
normal distribution AA(10,0.2). The graphs estimated by ‘GlassoClass’ and 
‘GlassoGaussQn’ from the contaminated data are shown in Panels (c) and 
(d) of Fig. respectively. While the graph estimated by ‘GlassoGaussQn’ 
hardly differs from the uncontaminated case, ‘GlassoClass’ estimates a graph 
without any edges. Thus, ‘GlassoGaussQn’ is robust in the sense that the 
estimate on the contaminated data resembles that of the clean data. In 
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Kullback-Leibler criterion 


p = 60 and n = 100 


p = 200 and n = 100 



0 % 5 % 10 % 15 % 20 % 25 % 30 % 35 % 40 % 


0 % 5 % 10 % 15 % 20 % 25 % 30 % 35 % 40 % 


Cellwise contamination 


Cellwise contamination 


Figure 1: Kullback-Leibler criterion for the ‘banded’ sampling scheme averaged over 
M = 100 simulations reported for various amounts of cellwise contamination 
and several estimators 

contrast, the nonrobust ‘GlassoClass’ returns a not reliable estimate in the 
presence of cellwise contaminated data. 

Estimating a cellwise robust, sparse precision matrix is not only interesting 
in graphical models. As an example consider linear discriminant analysis, 
where each observation belongs to one of K groups. The goal is then to 
assign a new observation x G to one of those K groups. Assuming a 
normal distributions for observations of group /c G {1,..., K}, the 

Bayes optimal solution is found via the linear discriminant function 



where vTfc is the a priori probability of belonging to group k. Replacing 
with the correlation based precision matrix estimated from the centered data 
(where each observation is centered by the coordinatewise median computed 
over the observations belonging to the same group) results in a cellwise ro¬ 
bust estimator for high-dimensional linear discriminant analysis. The final 
estimate may not be sparse anymore, but it is very robust under cellwise 
contamination. Furthermore, it can be computed even if p > n. 

Cellwise robust, sparse precision matrix estimation can also be used to 
obtain cellwise robust, sparse regression of y G MT" on X G Partitioning 
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data 

contaminated data 

‘GlassoClass’ 

‘GlassoGaussQn’ 

‘GlassoGlass’ 

‘GlassoGaussQn’ 

W". 



: 

(a) 

(b) 

(c) 

(d) 


Figure 2: Graphical models estimated from stockdata. Every node in the graph corre¬ 
sponds to one of the p = 452 stocks. 


the joint sample covariance estimate of (X, y) and its inverse into 

f ©XX ^Xy\ 


X = 


_ /^Xxx dxy 
yy 


V '^Xy 


0 = 




Xy 


9- 


yy 


the least squares estimator can be rewritten as 


PlS = ^XX^Xy = -^©Xy 
^yy 


using the partitioned inverse formula Seber[ 2008, 14.11], With the cor¬ 
relation based precision matrix estimate 0s((X, y)) computed jointly from 
(X,y), we obtain a cellwise robust, sparse regression estimate computable in 
high-dimensions 


/3 = - 


(0s((X,y)))p+i,p+i 


(0s((X,y)))i,p,p+i. 


8 Conclusions 


We have introduced a cellwise robust, correlation based precision matrix 
estimator. We put forward the following simple procedure: (i) compute 
the robust scale estimators Qn for each variable (ii) compute the robust 
correlation matrix from the normal scores, as in Equation ([^ (hi) construct 
then the robust covariance matrix from these correlations and robust scale, 
as in Equation Q (iv) use the latter as input for the GLASSO, returning 
0s (X). It is formally shown that the proposed estimator features a very 
high breakdown point under cellwise contamination. As its definition is very 
simple, the estimator can be computed very fast, even in high-dimensions. 

The simulation results presented in Section discuss the results of the 
various estimators including their selection of the regularization parame¬ 
ter p. As can be seen from a small simulation study with p = 60, kindly 


24 











provided by a referee, the bad performance of ‘GlassoNPDQn’ needs to be 
mainly attributed to the selection of p. When ‘GlassoNPDQn’ is run with 
the regularization parameter estimated by ‘GlassoGaussQn’, the two meth¬ 
ods performed similar. This problem also occurred for clean data. Replacing 
GV by BIG did not help to improve ‘GlassoNPDQn’: The performance in 
comparison to ‘GlassoGaussQn’ was still similar as in Tables and An¬ 
alyzing the reason for the bad performance of ‘GlassoNPDQn’ with respect 
to the selection of the regularization parameter is left for future research. 

Compared to the covariance based approach, a correlation based approach 
results in a simpler estimator. More importantly, it achieves a substantially 
higher breakdown point, is considerably faster to compute and yields more 
accurate estimates when the regularization parameter is selected using BIG 
or the new cross-validation criterion presented in Section 

Acknowledgments 

The authors wish to acknowledge the helpful comments from two anony¬ 
mous referees. The first author gratefully acknowledges support from the 
GOA/12/014 project of the Research Fund KU Leuven. 

References 

C. Agostinelli, A. Leung, V.J. Yohai, and R.H. Zamar. Robust estimation of 
multivariate location and scatter in the presence of cellwise and casewise 
contamination. To appear in TEST, 2015. http://arxiv.org/abs/1406.6031. 

F.A. Alqallaf, K.P. Konis, R.D. Martin, and R.H. Zamar. Scalable robust 
covariance and correlation estimates for data mining. In Proceedings of the 
Eighth ACM SIGKDD International Conference on Knowledge Discovery 
and Data Mining, pages 14-23. ACM, 2002. 

F.A. Alqallaf, S. Van Aelst, V.J. Yohai, and R.H. Zamar. Propagation of 
outliers in multivariate data. The Annals of Statistics, 37(1):311-331, 2009. 

O. Banerjee, A. d’Aspremont, and L. El Ghaoui. Model selection through 
sparse maximum likelihood estimation for multivariate Gaussian or binary 
data. Journal of Machine Learning Research, 9:485-516, 2008. 

B. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, 
1995. 

N. Blomqvist. On a measure of dependence between two random variables. 
The Annals of Mathematical Statistics, 21(4):593-600, 1950. 

K. Boudt, J. Cornelissen, and C. Croux. The Gaussian rank correlation 
estimator: robustness properties. Statistics and Computing, 22(2) :471- 
483, 2012. 


25 


P. Biihlmann and S. van de Geer. Statistics for high-dimensional data. 
Springer, Heidelberg, 2011. 

T.T. Cai, W. Liu, and X. Luo. A constrained h minimization approach to 
sparse precision matrix estimation. Journal of the American Statistical 
Association, 106(494):594-607, 2011. 

T.T. Cai, W. Liu, and X. Luo. clime: Constrained LI-minimization 
for Inverse (covariance) matrix estimation, 2012. URL http://GRAN. 
R-project .org/package=clime, R package version 0.4.1. 

C. Croux and C. Dehon. Influence functions of the Spearman and Kendall 
correlation measures. Statistical Methods & Applications, 19(4):497-515, 
2010. 

M. A. Finegold and M. Drton. Robust graphical modeling of gene networks us¬ 
ing classical and alternative t-distributions. The Annals of Applied Statis¬ 
tics, 5(2A):1057-1080, 2011. 

J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estima¬ 
tion with the graphical lasso. Biostatistics, 9(3):432-441, 2008. 

J. Friedman, T. Hastie, and R. Tibshirani. glasso: Graphical lasso estimation 
of Gaussian graphical models, 2014. URL http://CRAN.R-project.org/ 
package=glasso. R package version 1.8. 

R. Gnanadesikan and J.R. Kettenring. Robust estimates, residuals and out¬ 
lier detection with multiresponse data. Biometrics, 28(1):81-124, 1972. 

N. J. Higham. Computing the nearest correlation matrix - a problem from 
finance. IMA Journal of Numerical Analysis, 22(3):329-343, 2002. 

C.J. Hsieh, M.A. Sustik, I.S. Dhillon, and Ravikumar P. Sparse inverse 
covariance matrix estimation using quadratic approximation. In Ad¬ 
vances in Neural Information Processing Systems 24, pages 2330-2338. 
http://nips.cc/, 2011. 

R.A. Maronna and R.H. Zamar. Robust estimation of location and dispersion 
for high-dimensional datasets. Technometrics, 44(4):307-317, 2002. 

R.A. Maronna, R.D. Martin, and V.J. Yohai. Robust Statistics. John Wiley 
&: Sons, Hoboken, New Jersey, 2nd edition, 2006. 

E. Ollila and D. Tyler. Regularized M-estimators of scatter matrix. IEEE 
Transactions on Signal Processing, 62(22) :6059-6070, 2014. 

E. Ollila, H. Oja, and C. Croux. The affine equivariant sign covariance ma¬ 
trix: Asymptotic behavior and efficiencies. Journal of Multivariate Anal¬ 
ysis, 87(2):328-355, 2003. 


26 



P.J. Rousseeuw and C. Croux. Alternatives to the median absolute deviation. 
Journal of the American Statistical Association, 88(424):1273-1283, 1993. 

P.J. Rousseeuw and K. Van Driessen. A fast algorithm for the minimum 
covariance determinant estimator. Technometrics, 41(3):212“223, 1999. 

G.A.F. Seber. A matrix handbook for Statisticians. John Wiley k. Sons, 
Hoboken, New Jersey, 2008. 

C. Spearman. The proof and measurement of association between two things. 
American Journal of Psychology, 15(1):72-101, 1904. 

G. Tarr. Quantile based estimation of scale and dependence. PhD thesis. 
University of Sydney, 2014. 

G. Tarr, S. Miiller, and N.C. Weber. Robust estimation of precision matrices 
under cellwise contamination. Computational Statistics & Data Analysis, 
2015. doi: http://dx.doi.Org/10.1016/j.csda.2015.02.005. 

S. Van Aelst, E. Vandervieren, and G. Willems. Stahel-donoho estimators 
with cellwise weights. Journal of Statistical Computation and Simulation, 
81(l):l-27, 2011. 

S. Visuri, V. Koivunen, and H. Oja. Sign and rank covariance matrices. 
Journal of Statistical Planning and Inference, 91(2):557-575, 2000. 

M. Yuan and Y. Lin. Model selection and estimation in the gaussian graphical 
model. Biometrika, 94(l):19-35, 2007. 

T. Zhao, H. Liu, K. Roeder, J. Lafferty, and L. Wasserman. The huge package 
for high-dimensional undirected graph estimation in R. Journal of Maehine 
Learning Research, 13:1059-1062, 2012. 

T. Zhao, H. Liu, K. Roeder, J. Lafferty, and L. Wasserman. huge: 
High-dimensional undirected graph estimation, 2014. URL http://GRAN. 
R-project .org/package=huge, R package version 1.2.6. 


27 



